In contrast to point pattern based methods, we can view the location of cells or spots as a fixed lattice and measure the corresponding marker expression at each location. HTS-based spatial transcriptomic technologies often produce data on a regular lattice (i.e., approximately evenly spaced spots or beads of uniform size and shape), whereas imaging-based technologies yield irregular lattice structures (i.e., with variable cell sizes and shapes, and non-uniform spacing).
For this representation of the cells, we will rely on the SpatialFeatureExperiment package (see below for more details). For preprocessing of the dataset, we refer the reader to the vignette of the Voyager package (Moses et al. 2023). The Voyager package also provides wrapper functions around the package spdep(Pebesma and Bivand 2023) that work directly on the SpatialFeatureExperiment object. The package spdep is designed for the analysis of spatial data with lattice structure.
We will load a dataset generated by (McKellar et al. 2021) using the Visium technology (Ståhl et al. 2016). The data shows a sample taken from the tibialis anterior muscle of a mouse.
Show the code
# Load the datasetsfe <- SFEData::McKellarMuscleData(dataset ="full")# Take spots that are covered with tissuesfe_tissue <- sfe[, colData(sfe)$in_tissue]# Filter out genes with no countssfe_tissue <- sfe_tissue[rowSums(counts(sfe_tissue)) >0, ]# Convert counts log-transformed normalized expression valuessfe_tissue <- scater::logNormCounts(sfe_tissue)
Lattice data refers to spatial data collected at locations arranged in a regular or irregular grid (lattice). Each location has a defined spatial unit, and the sampling locations are fixed rather than random. This approach contrasts with point pattern analysis, where we assume that the locations were generated by a stochastic process (Zuur, Ieno, and Smith 2007; Pebesma and Bivand 2023).
As can be seen below, the Visium dataset is a regular lattice. The color shows the number of counts detected in each Visium spot. In contrast, the outlines of individual cells after segmentation (e.g., from a higher resolution spatial omics assay) could be seen as an irregular lattice. As this dataset also contains (manual) segmentations of myofibers (muscle cells stored as myofiber_simplified) we will illustrate calculations based on irregular lattice on the myofiber segmentations.
Show the code
# A plot using the plotSpatialFeature from Voyagerp <-plotSpatialFeature(sfe_tissue,"nCounts",annotGeometryName ="myofiber_simplified")# This extracts the segmented cellscells <-annotGeometry(sfe_tissue, "myofiber_simplified") |>st_geometry()# We can also use ggplot and geom_sf to plot sf objectsq <-ggplot() +geom_sf(data = cells, fill =NA) +theme_void()# Using `patchwork` to combine the plotsp | q
Spatial weight matrix
In lattice data analysis, a key concept is the spatial weight matrix, which models the spatial relationships between units in the lattice (i.e., spots or cells). Various methods exist for constructing this matrix, such as contiguity-based (direct neighbors), graph-based, or distance-based methods. (Getis 2009; Zuur, Ieno, and Smith 2007; Pebesma and Bivand 2023). The documentation of the package spdep gives an overview of the different methods.
For Visium, the most straightforward way is to take the direct neighbours of each spot. This is done using the function findVisiumGraph.
In an irregular lattice, the task of finding a spatial weight matrix is more complex, as different options exist. One option is to base the neighbourhood graph on neighbours that are in direct contact with each other (contiguous), as implemented in the poly2nb method.
Show the code
annotGraph(sfe_tissue, "myofiber_poly2nb") <-findSpatialNeighbors(sfe_tissue,type ="myofiber_simplified",MARGIN =3,method ="poly2nb", # wraps spdep function with same namezero.policy =TRUE )
Alternatively, we could take the five nearest neighbours of each cell.
Show the code
annotGraph(sfe_tissue, "knn5") <-findSpatialNeighbors(sfe_tissue,type ="myofiber_simplified",MARGIN =3, # to use the annotation geometrymethod ="knearneigh", # wraps the spdep function with the same namek =5,zero.policy =TRUE )
As we can see below, the graphs look quite distinct. On the left side, in the contiguous neigbhbour graph (neighbours in direct contact) we can notice the formation of patches, while in the knn graph isolated patches are interconnected.
Show the code
p1 + p2
Spatial autocorrelation
Spatial autocorrelation measures the association between spatial units while recognizing that the spatial units are not independent measurements. These measures can be global (summarizing the entire field) or local (providing statistics for individual locations) (Pebesma and Bivand 2023).
Global Measures
Global methods give us an overview over the entire field-of-view and summarize the spatial autocorrelation metric to a single value. The metrics are a function of the weight matrix and the variables of interest. The variables of interest can be gene expression, intensity of a marker or the area of a cell. The global measures can be seen as a weighted average of the local metric, as explained below.
In general, a global spatial autocorrelation measure has the form of a double sum over all locations \(i,j\)
\[\sum_i \sum_j f(x_i,x_j) w_{ij}\]
where \(f(x_i,x_j)\) is the measure of association between features of interest and \(w_{ij}\) scales the relationship by a spatial weight as defined in the weight matrix \(W\)(Zuur, Ieno, and Smith 2007; Pebesma and Bivand 2023).
Moran’s I
Moran’s I is the most prominent measure of spatial autocorrelation. The values are bounded by \(-1\) and \(1\). The expected value is close to \(0\) for large \(n\), the exact value is given by \(\mathbb{E}(I) = -1/(n-1)\). A value higher than the expected value indicates spatial auto-correlation. Negative values indicate negative auto-correlation. Spatial auto-correlation means that similar values tend to be found together in the tissue. In fact, Moran’s I can be interpreted as the Pearson correlation between the value at location \(i\) and the averages value of the neigbours of \(i\), (neighbours as defined in the weight matrix \(W\)) (Moran 1950).
In the first example we will calculate Moran’s I for the number of counts and genes measured in the Visium dataset. First, we have a look at the distribution by eye.
Here we can see the Moran’s I values for the number of counts and genes that were identified in the Visium dataset and a permutation-based null distribution. As we can see, the Moran’s I values both indicate positive spatial autocorrelation and are highly significant. This means that regions with similar count and number of measured gene values tend to cluster together. This is an interesting finding and was observed in other spatial transcriptomic datasets, see (Bhuva et al. 2024).
Local Measures for Univariate Data
Often a global measure is not enough. One number determining e.g. the spatial autocorrelation over an entire tissue slice might not be reflective of tissue heterogeneity. Therefore, local indicators of spatial associations have been developed (Pebesma and Bivand 2023).
Local Moran’s I
Local Moran’s I provides a measure of spatial autocorrelation at each location, highlighting local clusters of similarity or dissimilarity (Anselin 1995). It is defined as:
where the index \(i\) refers to the location for which the measure is calculated. The interpretation is analogous to the global Moran’s I where a value of \(I_i\) higher than \(\mathbb{E}(I) = -1/(n-1)\) indicates spatial auto-correlation; smaller values indicate negative auto-correlation. It is important to note that, as for the global counterpart, the value of local Moran’s I could be a result from both the high or low end of the values. Here we will calculate the local Moran’s I value for the measurement of muscle fiber marker gene Myh2:
Show the code
sfe_tissue <-runUnivariate(sfe_tissue,type ="localmoran",features ="Myh2",colGraphName ="visium",swap_rownames ="symbol")# plot the expression valuespExp <-plotSpatialFeature(sfe_tissue,features =c("Myh2"), colGeometryName ="spotPoly",swap_rownames ="symbol")# plot the local Moran's I valuespLi <-plotLocalResult(sfe_tissue, "localmoran",features =c("Myh2"),colGeometryName ="spotPoly", swap_rownames ="symbol",divergent =TRUE, diverge_center =0) +labs(fill ="li(Myh2)") # specify legend# plot the local Moran's I p-valuespPval <-plotLocalResult(sfe_tissue, "localmoran",features =c("Myh2"), "-log10p_adj",colGeometryName ="spotPoly", swap_rownames ="symbol",divergent =TRUE, diverge_center =-log10(0.05)) +labs(fill ="-log10(p.adj)") # specify legend# please note that thepExp + pLi + pPval
In the local version of Moran’s I, the interpretation is the same as the global version. When interpreting local autocorrelation measures, it is important to consider both the effect size estimates and the significance level. Since the significance level is calculated for each spot separately, it is recommended to adjust for multiple testing. By default, runUnivariate uses the Benjamini & Hochberg correction. The local Moran’s I statistics reveal locations in the tissue that have similar values to their neighbours (c.f., the lower part of the tissue).
Multivariate measures – Bivariate Moran’s I
There exists a bivariate version of Moran’s I as well. The package spatialDM(Li et al. 2023) uses an adapted version of bivariate Moran’s I to identify ligand-receptor pairs.
For two continous observations the global bivariate Moran’s I is defined as (Wartenberg 1985; Bivand 2022)
where \(a_i\) and \(b_i\) are the two variables of interest and \(w_{ij}\) is the value of the spatial weights matrix for positions \(i\) and \(j\). It is a measure of the correlation of the variables \(a\) with the the average of the neighboring values for variable \(b\) (also called the spatial lag of \(b\)).
There exists a local version that we will explore here. Note that the results are not symmetric, but very similar to each other.
Show the code
sfe_tissue <-runBivariate(sfe_tissue, "localmoran_bv", # wraps the method from spdepc("Myh1", "Myh2"),swap_rownames ="symbol", nsim =1000)# this command gives all results of localmoran_bvlocalResultFeatures(sfe_tissue, "localmoran_bv")
Please note that the result might overestimate the spatial autocorrelation of the variables due to the inherent (non-spatial) correlation of \(x\) and \(y\)(Bivand 2022).
Impact of neighbourhood on autocorrelation measures
Let’s compare the impact of different spatial weight matrices on local autocorrelation analysis. In this Visium dataset we do not have gene expression information for each cell because of the resolution of the spots. We will instead calculate the autocorrelation measures on the area of the segmented cells. This could be helpful when looking at local cell densities in a tissue.
First, we base the spatial weight matrix on contiguous neighbours.
What happens if we base the spatial weight matrix on 10 nearest neighbours?
Show the code
annotGraph(sfe_tissue, "knn10") <-findSpatialNeighbors(sfe_tissue,type ="myofiber_simplified",MARGIN =3, # to use the annotation geometrymethod ="knearneigh", # wraps the spdep function with the same namek =10,zero.policy =TRUE )sfe_tissue <-annotGeometryUnivariate(sfe_tissue, "localmoran", "area",annotGeometryName ="myofiber_simplified",annotGraphName ="knn10",include_self =FALSE, zero.policy =TRUE,name ="knn10")pKnn10 <-plotLocalResult(sfe_tissue, "knn10", "area",annotGeometryName ="myofiber_simplified",annotGraphName ="knn10",divergent =TRUE, diverge_center =0) +labs(title ="Local Moran's I (li)\n10 Nearest Neighbours", fill ="li(area)")
Show the code
pPoly + pKnn5 + pKnn10
The overall pattern of spatial autocorrelation is similar across the different weight matrices. However, locally there are some differences, as well as in the scale of the autocorrelation (note that the same color in the different plots does not correspond to the same value between the plots). There is a trend of smoothing of the value, the larger the neighbourhood in consideration.
Summary and Considerations
Lattice data refers to spatial data collected at fixed locations arranged in regular or irregular grids, contrasting with stochastic point pattern analysis.
Regular lattices have uniform, evenly spaced spots, while irregular lattices have variable sizes and shapes with non-uniform spacing.
The spatial weight matrix models spatial relationships between lattice units. Different methods exist to define the spatial weight matrix.
Global spatial autocorrelation measures, like Moran’s I, summarize spatial correlation over the entire field, while local measures identify local clusters of similarity or dissimilarity.
Bhuva, Dharmesh D., Chin Wee Tan, Agus Salim, Claire Marceaux, Marie A. Pickering, Jinjin Chen, Malvika Kharbanda, et al. 2024. “Library Size Confounds Biology in Spatial Transcriptomics Data.”Genome Biology 25 (1): 99. https://doi.org/10.1186/s13059-024-03241-7.
Bivand, Roger. 2022. “R Packages for Analyzing Spatial Data: A Comparative Case Study with Areal Data.”Geographical Analysis 54 (3): 488–518. https://doi.org/10.1111/gean.12319.
Li, Zhuoxuan, Tianjie Wang, Pentao Liu, and Yuanhua Huang. 2023. “SpatialDM for Rapid Identification of Spatially Co-Expressed Ligand–Receptor and Revealing Cell–Cell Communication Patterns.”Nature Communications 14 (1): 3995. https://doi.org/10.1038/s41467-023-39608-w.
McKellar, David W., Lauren D. Walter, Leo T. Song, Madhav Mantri, Michael F. Z. Wang, Iwijn De Vlaminck, and Benjamin D. Cosgrove. 2021. “Large-Scale Integration of Single-Cell Transcriptomic Data Captures Transitional Progenitor States in Mouse Skeletal Muscle Regeneration.”Communications Biology 4 (1): 1–12. https://doi.org/10.1038/s42003-021-02810-x.
Moran, P. A. P. 1950. “Notes on Continuous Stochastic Phenomena.”Biometrika 37 (1/2): 17–23. https://doi.org/10.2307/2332142.
Moses, Lambda, Pétur Helgi Einarsson, Kayla C. Jackson, Laura Luebbert, Ali Sina Booeshaghi, Sindri Emmanúel Antonsson, Nicolas Bray, Páll Melsted, and Lior Pachter. 2023. “Voyager: Exploratory Single-Cell Genomics Data Analysis with Geospatial Statistics.” bioRxiv. https://doi.org/10.1101/2023.07.20.549945.
Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With Applications in R. 1st ed. New York: Chapman and Hall/CRC. https://doi.org/10.1201/9780429459016.
Righelli, Dario, Lukas M Weber, Helena L Crowell, Brenda Pardo, Leonardo Collado-Torres, Shila Ghazanfar, Aaron T L Lun, Stephanie C Hicks, and Davide Risso. 2022. “SpatialExperiment: Infrastructure for Spatially-Resolved Transcriptomics Data in R Using Bioconductor.”Bioinformatics 38 (11): 3128–31. https://doi.org/10.1093/bioinformatics/btac299.
Ståhl, Patrik L., Fredrik Salmén, Sanja Vickovic, Anna Lundmark, José Fernández Navarro, Jens Magnusson, Stefania Giacomello, et al. 2016. “Visualization and Analysis of Gene Expression in Tissue Sections by Spatial Transcriptomics.”Science 353 (6294): 78–82. https://doi.org/10.1126/science.aaf2403.